0.1 Data Description
Three types of data were recorded in London between November 2011 and February 2014.
0.2 Weather Data Description
Features recorded at 1-hour resolution:
0.3 Introducing Temporal Features
Extracted temporal features:
Encoding features on the unit circle:
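The encoding can be sketched as follows. This is a minimal illustration, not the notebook's own code: the helper name `encode_cyclic` is hypothetical, and the periods (24 for hour of day, 365 for a zero-based day of year) are assumptions inferred from the `hourofd_x`/`hourofd_y` and `dayofy_x`/`dayofy_y` values in the table printed in section 1.4.

```python
import math

def encode_cyclic(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

# Half-hourly hour-of-day values (0.0, 0.5, ..., 23.5): assumed period of 24.
hour_x, hour_y = encode_cyclic(0.5, 24)
# Zero-based day-of-year values (0 ... 364): assumed period of 365.
day_x, day_y = encode_cyclic(364, 365)

print(round(hour_x, 6), round(hour_y, 6))  # 0.130526 0.991445
print(round(day_x, 6), round(day_y, 6))    # -0.017213 0.999852
```

With this encoding, 23:30 and 00:00 end up next to each other on the circle, whereas the raw values 23.5 and 0.0 are far apart.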
1.1 Selecting a Building
The analysis uses data for one household at half-hourly resolution. We look for a building that has:
searching block 77
Best in the block: house id MAC000068 with maximum 17 consecutive zeros
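The "maximum consecutive zeros" figure in the log above can be computed as sketched below. The function names, the toy readings, and the selection rule (prefer the household whose longest zero run is shortest) are assumptions made for illustration; the notebook's actual selection criteria are not shown here.

```python
from itertools import groupby

def longest_zero_run(readings):
    """Return the length of the longest run of consecutive zero readings."""
    longest = 0
    for is_zero, run in groupby(readings, key=lambda r: r == 0):
        if is_zero:
            longest = max(longest, sum(1 for _ in run))
    return longest

def best_household(block):
    """Assumed rule: pick the household id with the shortest longest-zero-run."""
    return min(block, key=lambda house_id: longest_zero_run(block[house_id]))

# Toy block with hypothetical household ids and readings.
block = {
    "house_a": [0.2, 0.0, 0.0, 0.0, 0.4, 0.1],
    "house_b": [0.3, 0.0, 0.5, 0.0, 0.0, 0.2],
}
best = best_household(block)
print(best, longest_zero_run(block[best]))  # house_b 2
```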
1.2 Cleaning Data
Data starts at 2011-12-09 and ends at 2014-02-28
Only considering data collected from 2012-01-01 to 2013-12-31
Data has missing row in
2012-10-25 [22.]
2013-01-30 [1.5]
2013-04-22 [12.5]
2013-07-03 [10.5]
2013-08-02 [19.5]
2013-11-18 [12. 12.5 13. 13.5 14. 14.5 15. 15.5]
Added row for missing time steps
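A check of this kind can be sketched as follows. It walks the expected half-hourly grid and reports every slot that is absent, grouped by date and expressed as hour of day, in the same shape as the log above. The helper `find_missing_slots` is hypothetical, and the toy day simply mirrors the first log entry; the notebook itself most likely used pandas for this step.

```python
from datetime import datetime, timedelta

def find_missing_slots(timestamps, start, end, step=timedelta(minutes=30)):
    """Return {date: [hour of day, ...]} for every expected slot not observed."""
    seen = set(timestamps)
    missing = {}
    current = start
    while current <= end:
        if current not in seen:
            hour_of_day = current.hour + current.minute / 60
            missing.setdefault(current.date(), []).append(hour_of_day)
        current += step
    return missing

# Toy day with the 22:00 record removed.
start = datetime(2012, 10, 25, 0, 0)
end = datetime(2012, 10, 25, 23, 30)
observed = [start + i * timedelta(minutes=30) for i in range(48)]
observed.remove(datetime(2012, 10, 25, 22, 0))

missing = find_missing_slots(observed, start, end)
print(missing)  # {datetime.date(2012, 10, 25): [22.0]}
```

Rows for the reported slots can then be added with an empty energy value, so that every day has 48 records before interpolation.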
Duplicate rows:
energy date hourofd
16927 NaN 2012-12-18 15.0
All days have 48 records
Data contains 13 NaNs in columns ['energy']
Replaced by interpolation.
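The log does not state which interpolation method was used. As a minimal sketch, assuming linear interpolation between the nearest known neighbours, the gap filling could look like this (the helper name and the toy values are illustrative only):

```python
import math

def interpolate_nans(values):
    """Linearly interpolate interior NaN runs; leading/trailing NaNs are left as-is."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if not math.isnan(v)]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled

# Toy half-hourly energy readings with a two-step gap.
energy = [0.2, float("nan"), float("nan"), 0.8, 0.5]
filled = interpolate_nans(energy)
print(filled)
```

Linear interpolation is reasonable for short gaps such as single missing half-hours; the eight-step gap on 2013-11-18 spans four hours, so the filled values there should be treated with more caution.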
1.3 Add Temporal Factors
number of holidays 25.0
1.4 Add Weather Features
Check for missing dates in the hourly data:
Data has missing row in
2013-09-09 [23]
2013-09-10 [0]
Added row for missing time steps
Hourly weather contains 4 NaNs in rows:
temperature icon date hourofd
14831 NaN NaN 2013-09-09 23
14832 NaN NaN 2013-09-10 0
Hourly weather nans were interpolated
Missing dates of daily icons:
[datetime.date(2012, 4, 28), datetime.date(2012, 5, 3), datetime.date(2012, 10, 12), datetime.date(2012, 10, 19), datetime.date(2012, 10, 23), datetime.date(2012, 10, 26), datetime.date(2013, 4, 26), datetime.date(2013, 5, 21), datetime.date(2013, 8, 24), datetime.date(2013, 9, 6), datetime.date(2013, 9, 9)]
energy date hourofd hourofd_x hourofd_y weekday flag_weekend \
0 0.572 2012-01-01 0.0 0.000000 1.000000 Sunday True
1 0.910 2012-01-01 0.5 0.130526 0.991445 Sunday True
2 0.142 2012-01-01 1.0 0.258819 0.965926 Sunday True
3 0.696 2012-01-01 1.5 0.382683 0.923880 Sunday True
4 0.392 2012-01-01 2.0 0.500000 0.866025 Sunday True
dayofy dayofy_x dayofy_y daypart month season flg_holiday day_type \
0 0.0 0.0 1.0 night 1 winter False weekend
1 0.0 0.0 1.0 night 1 winter False weekend
2 0.0 0.0 1.0 night 1 winter False weekend
3 0.0 0.0 1.0 night 1 winter False weekend
4 0.0 0.0 1.0 night 1 winter False weekend
temperature_hourly icon_hourly icon_daily
0 12.12 partly-cloudy-night partly-cloudy-day
1 12.12 partly-cloudy-night partly-cloudy-day
2 12.59 cloudy partly-cloudy-day
3 12.59 cloudy partly-cloudy-day
4 12.45 partly-cloudy-night partly-cloudy-day
energy date hourofd hourofd_x hourofd_y weekday \
35083 1.270 2013-12-31 21.5 -0.608761 0.793353 Tuesday
35084 0.822 2013-12-31 22.0 -0.500000 0.866025 Tuesday
35085 0.626 2013-12-31 22.5 -0.382683 0.923880 Tuesday
35086 0.726 2013-12-31 23.0 -0.258819 0.965926 Tuesday
35087 0.630 2013-12-31 23.5 -0.130526 0.991445 Tuesday
flag_weekend dayofy dayofy_x dayofy_y daypart month season \
35083 False 364.0 -0.017213 0.999852 evening 12 winter
35084 False 364.0 -0.017213 0.999852 night 12 winter
35085 False 364.0 -0.017213 0.999852 night 12 winter
35086 False 364.0 -0.017213 0.999852 night 12 winter
35087 False 364.0 -0.017213 0.999852 night 12 winter
flg_holiday day_type temperature_hourly icon_hourly icon_daily
35083 False weekday 5.92 clear-night partly-cloudy-day
35084 False weekday 6.54 clear-night partly-cloudy-day
35085 False weekday 6.54 clear-night partly-cloudy-day
35086 False weekday 7.43 clear-night partly-cloudy-day
35087 False weekday 7.43 clear-night partly-cloudy-day
2.1 Plot monthly patterns
2.2 Box Plots
Figure: Weather Condition Box Plot
2.3 FFT
    Rank  Period (day + hours)   Power         Freq (1/hour)
0   1     0 days, 12.0 hours     5.025002e+07  0.083333
1   2     1 days, 0.0 hours      2.484073e+07  0.041667
2   3     365 days, 12.0 hours   5.622911e+06  0.000114
3   4     0 days, 5.0 hours      2.878192e+06  0.208333
4   5     1 days, 0.0 hours      1.730319e+06  0.041781
5   6     0 days, 6.0 hours      1.406628e+06  0.166667
6   7     1 days, 0.0 hours      1.378672e+06  0.041553
7   8     0 days, 12.0 hours     1.225793e+06  0.083447
8   9     0 days, 5.0 hours      1.124046e+06  0.208219
9   10    0 days, 5.0 hours      1.113250e+06  0.208447
10  11    0 days, 4.0 hours      5.647281e+05  0.250000
11  12    0 days, 3.5 hours      5.592615e+05  0.291667
12  13    0 days, 8.0 hours      5.316961e+05  0.124886
13  14    0 days, 12.0 hours     5.267863e+05  0.083219
14  15    0 days, 18.5 hours     4.397431e+05  0.053580
15  16    0 days, 3.5 hours      4.129859e+05  0.291781
16  17    3 days, 12.0 hours     3.017177e+05  0.011913
17  18    0 days, 10.5 hours     2.979761e+05  0.095246
18  19    0 days, 8.0 hours      2.901115e+05  0.125114
19  20    0 days, 12.0 hours     2.874928e+05  0.083561
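A ranking like the one above can be produced by computing the power at each positive frequency and sorting in descending order. The sketch below uses a plain O(n²) discrete Fourier transform from the standard library on a synthetic signal; it is an illustration under those assumptions, not the notebook's code, which presumably relied on an FFT routine. The function name `power_spectrum` and the test signal are hypothetical.

```python
import cmath
import math

def power_spectrum(signal, step_hours=1.0):
    """Return (frequency in cycles/hour, power) for the positive DFT frequencies."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the zero-frequency component
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum.append((k / (n * step_hours), abs(coeff) ** 2))
    return spectrum

# Synthetic hourly signal over 10 days: strong 12 h cycle, weaker 24 h cycle.
signal = [2.0 * math.sin(2 * math.pi * t / 12) + math.sin(2 * math.pi * t / 24)
          for t in range(240)]

ranked = sorted(power_spectrum(signal), key=lambda item: item[1], reverse=True)
for rank, (freq, power) in enumerate(ranked[:2], start=1):
    print(rank, round(1 / freq, 1), "hours")
# 1 12.0 hours
# 2 24.0 hours
```

The period is the reciprocal of the frequency, so 0.083333 cycles per hour corresponds to 12 hours and 0.041667 to 24 hours, matching the top two rows of the table.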
2.3 Consumption vs. Time and Temperature
2.4 Consumption Heatmap vs. Time and Temperature
Heating in the UK:
4.1 ACF: Autocorrelation with confidence bounds
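As a rough sketch of what such a plot contains, the sample autocorrelation can be compared against approximate 95% bounds of ±1.96/√N, which hold under the assumption that the series is white noise. The plotting routine used in the notebook may compute its bounds differently (for example with Bartlett's formula); the function name and the synthetic series below are illustrative only.

```python
import math
import random

def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)
    return [sum(centered[t] * centered[t - lag] for t in range(lag, n)) / variance
            for lag in range(max_lag + 1)]

random.seed(0)
# Synthetic series: a daily cycle of 48 half-hourly steps plus Gaussian noise.
series = [math.sin(2 * math.pi * t / 48) + random.gauss(0, 0.3)
          for t in range(48 * 20)]

acf = autocorrelation(series, 48)
bound = 1.96 / math.sqrt(len(series))  # approximate white-noise 95% bound
significant = [lag for lag in range(1, 49) if abs(acf[lag]) > bound]
print(len(significant), "of 48 lags exceed the bound")
```

Lags whose autocorrelation falls outside the bounds are candidates for lagged input features; for a series with a daily cycle, the lag of one full day (48 steps) stands out clearly.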
4.2 PACF
The partial autocorrelation function (PACF) plot shows the autocorrelation at lag k that is not explained by lower-order autocorrelations. The partial autocorrelation at lag k is the coefficient of LAG(Y, k) in an AR(k) model, i.e., in a regression of Y on LAG(Y, 1), LAG(Y, 2), ..., LAG(Y, k).
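The definition can be illustrated with the Durbin-Levinson recursion, which yields the last coefficient of successive AR(k) fits from the sample autocorrelations (the Yule-Walker estimate rather than an explicit least-squares regression). This is a standard-library sketch with hypothetical function names, not the routine used to draw the plot.

```python
import random

def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)
    return [sum(centered[t] * centered[t - lag] for t in range(lag, n)) / variance
            for lag in range(max_lag + 1)]

def partial_autocorrelation(series, max_lag):
    """PACF for lags 0..max_lag via the Durbin-Levinson recursion."""
    rho = autocorrelation(series, max_lag)
    pacf = [1.0]
    phi_prev = []  # coefficients of the AR(k-1) fit
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den  # coefficient of LAG(Y, k) in the AR(k) fit
        phi_curr = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_curr.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi_curr
    return pacf

random.seed(1)
# Synthetic AR(1) process: y_t = 0.7 * y_{t-1} + noise.
y = [0.0]
for _ in range(5000):
    y.append(0.7 * y[-1] + random.gauss(0, 1))

pacf = partial_autocorrelation(y, 5)
print([round(p, 2) for p in pacf])
```

For an AR(1) process the PACF is close to the AR coefficient at lag 1 and close to zero at higher lags, which is why a sharp cut-off in the PACF plot is commonly read as an indication of how many lags to include.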
Observations:
Plot suggests that the following time steps are relevant:
Recurrence Plots
ADF Statistic: -7.783847
p-value: 0.000000
Critical Values:
1%: -3.435
5%: -2.864
10%: -2.568
KPSS Statistic: 5.8554645836527435
p-value: 0.01 (the actual p-value is smaller than the indicated value)
num lags: 52
Critical Values:
10%: 0.347
5%: 0.463
2.5%: 0.574
1%: 0.739
Result: The series is not stationary